328 research outputs found

Social Ranking Techniques for the Web

Author: Banerjee A.
Brin S.
Easley D.
Kleinberg J.
Lempel R.
Milgram S.
Mislove A.
Publication date: 26/06/2013

The proliferation of social media has the potential for changing the structure and organization of the web. In the past, scientists have looked at the web as a large connected component to understand how the topology of hyperlinks correlates with the quality of information contained in the page and they proposed techniques to rank information contained in web pages. We argue that information from web pages and network data on social relationships can be combined to create a personalized and socially connected web. In this paper, we look at the web as a composition of two networks, one consisting of information in web pages and the other of personal data shared on social media web sites. Together, they allow us to analyze how social media tunnels the flow of information from person to person and how to use the structure of the social network to rank, deliver, and organize information specifically for each individual user. We validate our social ranking concepts through a ranking experiment conducted on web pages that users shared on Google Buzz and Twitter. Comment: 7 pages, ASONAM 201

arXiv.org e-Print Archive

Crossref

Optimizing XML Compression

Author: A. Lempel
J. Ziv
P. Skibinski
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

The eXtensible Markup Language (XML) provides a powerful and flexible means of encoding and exchanging data. As it turns out, its main advantage as an encoding format (namely, its requirement that all open and close markup tags are present and properly balanced) also yields one of its main disadvantages: verbosity. XML-conscious compression techniques seek to overcome this drawback. Many of these techniques first separate XML structure from the document content, and then compress each independently. Further compression gains can be realized by identifying and compressing together document content that is highly similar, thereby amortizing the storage costs of auxiliary information required by the chosen compression algorithm. Additionally, the proper choice of compression algorithm is an important factor not only for the achievable compression gain, but also for access performance. Hence, choosing a compression configuration that optimizes compression gain requires one to determine (1) a partitioning strategy for document content, and (2) the best available compression algorithm to apply to each set within this partition. In this paper, we show that finding an optimal compression configuration with respect to compression gain is an NP-hard optimization problem. This problem remains intractable even if one considers a single compression algorithm for all content. We also describe an approximation algorithm for selecting a partitioning strategy for document content based on the branch-and-bound paradigm. Comment: 16 pages, extended version of paper accepted for XSym 200

arXiv.org e-Print Archive

CiteSeerX

Crossref

Hierarchical Context-based Pixel Ordering

Author: Bar-Joseph Z.
De Bonet J.S.
Fisher Y.
Lempel A.
Stollnitz Eric J.
Publication venue: 'Wiley'

Crossref

Factorization in Formal Languages

Author: A Lempel
A Restivo
A Weber
F Blanchet-Sadri
F Burderi
G-Q Zhang
N Immerman
N Rampersad
R Szelepcsényi
T Head
Publication date: 01/01/2015

We consider several novel aspects of unique factorization in formal languages. We reprove the familiar fact that the set uf(L) of words having unique factorization into elements of L is regular if L is regular, and from this deduce a quadratic upper and lower bound on the length of the shortest word not in uf(L). We observe that uf(L) need not be context-free if L is context-free. Next, we consider variations on unique factorization. We define a notion of "semi-unique" factorization, where every factorization has the same number of terms, and show that, if L is regular or even finite, the set of words having such a factorization need not be context-free. Finally, we consider additional variations, such as unique factorization "up to permutation" and "up to subset".

arXiv.org e-Print Archive

Crossref

Loughborough University Institutional Repository

Patterns of Individual Shopping Behavior

Author: A Dixit
A Lempel
AM Muniz Jr
C Song
D Brockmann
F Bartumeus
G Viswanathan
I Ajzen
JP Bagrow
M Heisenberg
MC Gonzalez
N Eagle
P Nelson
R Doyle
S Iyengar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/08/2010

Much of economic theory is built on observations of aggregate, rather than individual, behavior. Here, we present novel findings on human shopping patterns at the resolution of a single purchase. Our results suggest that much of our seemingly elective activity is actually driven by simple routines. While the interleaving of shopping events creates randomness at the small scale, on the whole consumer behavior is largely predictable. We also examine income-dependent differences in how people shop, and find that wealthy individuals are more likely to bundle shopping trips. These results validate previous work on mobility from cell phone data, while describing the unpredictability of behavior at higher resolution. Comment: 4 pages, 5 figures

arXiv.org e-Print Archive

DSpace@MIT

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

PubMed Central

Universidad Carlos III de Madrid e-Archivo

Composite repetition-aware data structures

Author: A Blumer
A Lempel
D Arroyuelo
D Belazzougui
DE Willard
J Radoszewski
J Sirén
J Ziv
M Crochemore
M Raffinot
P Ferragina
S Kreft
T Gagie
V Mäkinen
W Rytter
Publication date: 01/01/2015

In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT also enables a new representation of the suffix tree, whose size depends again on the number of extensions of maximal repeats, and that is powerful enough to support matching statistics and constant-space traversal. Comment: (the name of the third co-author was inadvertently omitted from previous version)

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

A New View on Worst-Case to Average-Case Reductions for NP Problems

Author: A. Bogdanov
A. Lempel
C.-K. Yap
D. Gutfreund
D. Micciancio
G. Brassard
J. Feigenbaum
M. Blum
M. Sudan
S. Ben-David
S. Even
T. Watson
V. Lyubashevsky
W. Aiello
W. Diffie
Publication date: 01/01/2014

We study the result by Bogdanov and Trevisan (FOCS, 2003), who show that under reasonable assumptions, there is no non-adaptive worst-case to average-case reduction that bases the average-case hardness of an NP-problem on the worst-case complexity of an NP-complete problem. We replace the hiding and the heavy samples protocol in [BT03] by employing the histogram verification protocol of Haitner, Mahmoody and Xiao (CCC, 2010), which proves to be very useful in this context. Once the histogram is verified, our hiding protocol is directly public-coin, whereas the intuition behind the original protocol inherently relies on private coins.

arXiv.org e-Print Archive

Crossref

Dictionary-based methods for information extraction

Author: A. Baronchelli
Benedetto
Bennett
E. Caglioti
E. Pizzi
Lempel
Li
Li
Puglisi
Shannon
V. Loreto
Wyner
Ziv
Publication venue: 'Elsevier BV'
Publication date: 01/01/2004

In this paper, we present a general method for information extraction that exploits the features of data compression techniques. We first define and focus our attention on the so-called dictionary of a sequence. Dictionaries are intrinsically interesting, and a study of their features can be very useful for investigating the properties of the sequences they have been extracted from, e.g., DNA strings. We then describe a procedure of string comparison between dictionary-created sequences (or artificial texts) that gives very good results in several contexts. We finally present some results on self-consistent classification problems.

arXiv.org e-Print Archive

City Research Online

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Features of the Extension of a Statistical Measure of Complexity to Continuous Systems

Author: A. Lempel
A. N. Kolmogorov
A. Wehrl
B. A. Huberman
C. Adami
C. Anteneodo
C. Tsallis
D. P. Feldman
G. Chaitin
J. P. Crutchfield
J. S. Shiner
José Garay
P. Grassberger
R. López-Ruiz
Raquel G. Catalán
Ricardo López-Ruiz
S. Lloyd
X. Calbet
Y. ZuGuo
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/2002

We discuss some aspects of the extension to continuous systems of a statistical measure of complexity introduced by Lopez-Ruiz, Mancini and Calbet (LMC) [Phys. Lett. A 209 (1995) 321]. In general, the extension of a magnitude from the discrete to the continuous case is not a trivial process and requires some choice. In the present study, several possibilities appear available. One of them is examined in detail. Some interesting properties desirable for any magnitude of complexity are discovered on this particular extension. Comment: 22 pages, 0 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

New method for analysis of nonstationary signals

Author: A Lempel
AL Goldberger
B Cohen
JJ Żebrowski
K Hu
M Lehrman
P Graben
Robert A Stepien
XS Zhang
Z Chen
Publication venue: BioMed Central
Publication date: 01/01/2011

Crossref

Springer - Publisher Connector

PubMed Central